Bulletin of Electrical Engineering and Informatics 
Vol. 12, No. 1, February 2023, pp. 268~274 
ISSN: 2302-9285, DOI: 10.11591/eei.v12i1.4395


Arabic vowels characterization and classification using the 
normalized energy in frequency bands 


Mohamed Farchi, Karim Tahiry, Ahmed Mouhsen 


IMII Laboratory, Faculty of Sciences and Technics, Hassan First University of Settat, Settat, Morocco 


Article Info

Article history:
Received Jun 27, 2022
Revised Sep 15, 2022
Accepted Oct 5, 2022

Keywords:
Arabic vowels
Classification algorithm
Normalized energy bands
Vowel nucleus

ABSTRACT

The main objective of this work is to conduct an acoustic study of Arabic
vowels (/a/, /a:/, /u/, /u:/, /i/ and /i:/) in order to determine the most relevant
characteristics that allow recognizing these vowels. The analysis of vowel
spectrograms reveals that the energy distribution as a function of time and
frequency clearly differs according to the considered vowel. Thus, we used
the normalized energy in frequency bands to classify these vowels.
Thereafter, we have exploited the obtained results to develop algorithms that
allow the classification of vowels and the distinction of the long vowels from
the short ones. The efficiency of these algorithms was evaluated by testing
their performances on our Arabic corpus.

This is an open access article under the CC BY-SA license.


Corresponding Author: 


Mohamed Farchi 

IMII Laboratory, Faculty of Sciences and Technics, University Hassan First 
Settat, Morocco 

Email: simo.farchi@gmail.com


1. INTRODUCTION 

Speech sounds are broadly classified as voiced or unvoiced. Vowels are voiced sounds produced when
quasi-periodic air pulses are acoustically filtered as they flow through the vocal tract. Vowels and
consonants differ primarily in that vowels resonate in the throat [1], [2]. Moreover, vowels are
produced without constriction in the vocal tract, unlike other types of sounds, which are all produced by
narrowing the flow of air through the vocal tract. Vowels are articulated by raising a part of the tongue body;
the location of the vowel refers to the part of the tongue that is highest during its production [3].

Vowel identification is crucial in continuous speech recognition. Hence, considerable effort has been
devoted to analyzing and characterizing vowels. Although analyzing speech signals in the frequency domain is
extremely important for studying their acoustic properties, formants remain the classical way of classifying
vowels. In particular, the first two formants are the most significant acoustic parameters that determine
vowels. Some researchers have exploited formant frequencies to develop algorithms for the identification
and classification of vowels in continuous speech [1]-[4]. Other researchers have developed a recognition
system based on the frequencies of the first and second formants (F1 and F2) [5]-[9]. They reported that the
formants of the long vowels are peripheral to those of the short ones. Natour et al. [10] found that the front
vowel /i/ has a high F2, while the back vowel /u/ has a very low F2. The F2 values for the central
vowel /a/ lie between these two extremes. Several additional cues that distinguish vowels have also been found
in earlier studies. Alotaibi and Hussain [5] showed that duration is very important for distinguishing between
short and long vowels. Other researchers have shown a relationship between speech rate and vowel duration:
as the speech rate increases, the vowel duration becomes shorter [11]. More recently, researchers have
indicated that vowel duration is a highly important acoustic cue for vowel identification and speech
intelligibility [12], [13]. Moreover, Khattab and Al-Tamimi [14] found no significant difference between
males and females in the durational results. Other investigators have developed a method based on the
wavelet transform and spectral analysis for consonant and vowel segmentation in Arabic speech
without linguistic information [15]. It has also been reported that there is a significant difference between
long and short vowels in both quantity and quality [16].

The literature review shows that the different methods used to classify the vowels (/a/, /u/ and /i/) are
effective. However, the distinction between short and long vowels remains difficult to implement. Indeed,
researchers report that long vowels have a longer production time than short vowels [17], [18] and that long
vowel formants are peripheral to those of short ones [19]. However, the variability of the speech signal makes
it very difficult to determine the production time and the formant ranges of a short or long vowel. In
our previous study [20], we found that both the formant frequencies and the normalized energy bands can
differentiate between short Arabic vowels. Additionally, we found that the spectral moments of
long vowels (center of gravity, CoG, and standard deviation, STD) show that their production occurs in two
stages: a transient phase at the start of vowel formation and a steady state phase as duration lengthens.
Furthermore, the difference between short and long vowels lies in the fact that the equilibrium position is
maintained longer in the production of a long vowel, so the rate of change of the formants or of the normalized
energy (percentage of energy) in the frequency bands can be a good indicator for distinguishing long vowels
from short ones. According to [21] and [22], the spectral moments enable a more thorough definition of the
vowel category. Korkko [23] used spectral moments to examine how young children produced the consonant /s/ in
symmetrical vowel contexts (such as /isi/, /usu/ and /ysy/). His research shows that vowel co-articulation has
a considerable impact on the spectral properties of /s/.

The main goal of this research is to add to the body of knowledge regarding Arabic vowels in 
experimental literature. The conclusions reported here are based on an acoustic analysis of Arabic vowels and 
the creation of an algorithm for classifying long and short vowels. The main objective is to detect the Arabic 
vowels using the bands' normalized energy. 

The structure of this paper is as follows. The procedures used, the equipment used, and the 
experiments conducted are described in the first section. The results are presented and discussed in the second 
section. A summary of the results and a discussion of the conclusions are presented in the final section. 


2. METHOD 
2.1. General processing 

A set of experiments was conducted to describe how Arabic vowels behave. The
methodology for these studies is outlined in this section, together with information on how the data were
gathered and the tools that were employed. A corpus of Arabic with short and long vowels (/a/, /a:/,
/u/, /u:/, /i/, and /i:/) was created. Twenty Moroccan speakers between the ages of 20 and 40 were asked to
repeat isolated CV syllables containing both short and long vowels. Isolated syllables rather than
words were selected to reduce the influence of other phonemes on the vowels under study. Additionally, the
vowel's length can be freely increased. Since producing it involves little strain on the vocal tract, the
glottal stop /ʔ/ was chosen as the consonant for the entire corpus (Table 1). The speech was sampled at
22,050 Hz and divided into 11.6 ms segments with a 9.6 ms overlap. After Hamming windowing and zero-padding
each segment, a 512-point fast Fourier transform was computed.
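
The framing and transform step described above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the authors' MATLAB implementation: the function name `stft_magnitude` is hypothetical, and rounding 11.6 ms and 9.6 ms to 256 and 212 samples at 22,050 Hz is an assumption about how the durations map to sample counts.

```python
import numpy as np

FS = 22050                        # sampling rate stated in the text (Hz)
FRAME_LEN = round(0.0116 * FS)    # 11.6 ms -> 256 samples (assumed rounding)
OVERLAP = round(0.0096 * FS)      # 9.6 ms  -> 212 samples (assumed rounding)
HOP = FRAME_LEN - OVERLAP         # 44 samples between frame starts
NFFT = 512                        # FFT size stated in the text


def stft_magnitude(signal, frame_len=FRAME_LEN, hop=HOP, nfft=NFFT):
    """Return the magnitude spectrum |X(n, k)|, shape (n_frames, nfft // 2 + 1)."""
    signal = np.asarray(signal, dtype=float)
    if signal.size < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (signal.size - frame_len) // hop
    window = np.hamming(frame_len)
    # Cut overlapping frames and apply the Hamming window to each one.
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    # rfft zero-pads every windowed frame up to nfft points.
    return np.abs(np.fft.rfft(frames, n=nfft, axis=1))
```

With these settings each DFT bin spans 22050/512, about 43 Hz, so the 0-400 Hz band covers roughly the first ten bins.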


Table 1. Arabic corpus of long and short vowels
Vowel /a/     Vowel /i/     Vowel /u/
/ʔa/          /ʔi/          /ʔu/
/ʔaa/         /ʔii/         /ʔuu/
/ʔaaa/        /ʔiii/        /ʔuuu/
/ʔaaaa/       /ʔiiii/       /ʔuuuu/


2.2. Computation of energy band 

The magnitude spectrum was smoothed out per frame along the time index n using a 20-point 
moving average. Six distinct frequency bands (band 1: 0-400 Hz; band 2: 400-800 Hz; band 3: 800-1200 Hz; 
band 4: 1200-2000 Hz; band 5: 2000-3500 Hz and band 6: 3500-5000 Hz) were chosen from the smoothed 
spectrum X(n,k). The energy in each band was calculated as (1): 


$$E_b(n) = \sum_{k} 10 \log_{10}\left(|X(n,k)|^{2}\right) \tag{1}$$

where the band index b ranges from 1 to 6 and the frequency index k runs over the DFT indices between
the lower and upper boundaries of each band. Then, the normalized energy band for each frame was
determined by:


$$E_{bn}(n) = \frac{E_b(n)}{E_T(n)} \tag{2}$$

where E_bn(n) denotes the normalized energy of band b in frame n, E_T(n) is the frame's overall energy, and
E_b(n) denotes the energy of band b in frame n.
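
The smoothing, band splitting and normalization can be illustrated with the sketch below. It rests on several assumptions that the text does not settle: the ratio is taken on linear energies (the sum of squared magnitudes) rather than on dB values, so that the six values add up to 100% as in the percentage tables; the overall energy is taken as the sum over the six bands (0-5000 Hz); and the function names are hypothetical.

```python
import numpy as np

# Band edges in Hz, as listed in the text (B1 to B6).
BANDS_HZ = [(0, 400), (400, 800), (800, 1200),
            (1200, 2000), (2000, 3500), (3500, 5000)]


def smooth_over_time(mag, width=20):
    """Moving average along the frame axis, one frequency bin at a time."""
    mag = np.asarray(mag, dtype=float)
    width = max(1, min(width, mag.shape[0]))   # never longer than the signal
    kernel = np.ones(width) / width
    return np.stack([np.convolve(mag[:, k], kernel, mode="same")
                     for k in range(mag.shape[1])], axis=1)


def band_energy_percent(mag, fs=22050, nfft=512):
    """Percentage of linear energy per band, shape (n_frames, 6)."""
    mag = np.asarray(mag, dtype=float)
    freqs = np.arange(mag.shape[1]) * fs / nfft   # centre frequency of each bin
    energy = np.stack([(mag[:, (freqs >= lo) & (freqs < hi)] ** 2).sum(axis=1)
                       for lo, hi in BANDS_HZ], axis=1)
    total = energy.sum(axis=1, keepdims=True)     # assumed E_T: sum of the six bands
    total[total == 0] = 1.0                       # silent frames give 0% everywhere
    return 100.0 * energy / total
```

A frame whose energy sits entirely below 400 Hz would yield 100% in B1 and 0% elsewhere.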

The vowel is composed of three segments: the onset is the first segment, the nucleus is the central
segment, and the coda is the closing segment. The energy in the vowel nucleus was calculated as follows:

$$E_{nucleus} = \sum_{n = d_v/3}^{2 d_v/3} E_{bn}(n) \tag{3}$$

where E_nucleus is the normalized band energy in the vowel nucleus and d_v is the overall vowel duration.
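
As a small illustration of the nucleus computation, the sketch below sums the per-frame normalized band energies over the middle third of the vowel. Expressing the vowel duration as a number of frames and using integer division for the boundaries are assumptions, and the function names are hypothetical.

```python
import numpy as np


def nucleus_bounds(n_frames):
    """Frame indices [start, stop) of the middle third of the vowel."""
    return n_frames // 3, (2 * n_frames) // 3


def nucleus_energy(e_bn):
    """Sum the normalized band energies over the nucleus frames.

    e_bn has shape (n_frames, n_bands); the result has one value per band.
    """
    e_bn = np.asarray(e_bn, dtype=float)
    start, stop = nucleus_bounds(e_bn.shape[0])
    return e_bn[start:stop].sum(axis=0)
```

For a vowel of nine frames, the nucleus covers frames 3, 4 and 5 (counting from zero).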


3. RESULTS 
3.1. Band energy 

The first purpose of this part is to study the energy distribution of the vowels (/a/, /i/ and /u/) in the
predefined frequency bands (B1, B2, B3, B4, B5, and B6) according to production duration. Tables 2-4 summarize
the obtained results. We can notice that all vowels have significant energy in the first band since they are
voiced sounds. Additionally, the percentage distribution of energy across the bands is unaffected by the
production duration.

We can also see that for /a/, more than 70% of the total energy of the vowel is concentrated in bands
B1 and B2 in roughly equal parts (~35% each). On the other hand, for /u/, more than 70% of its energy is
located in band B1 and about 20% in band B2. For the vowel /i/, more than 70% of its energy is concentrated
in band B1 against about 20% in band B5.


Table 2. Distribution of the energy percentage in the bands according to the production time of the vowel /a/
Production duration (s)   B1 (%)   B2 (%)   B3 (%)   B4 (%)   B5 (%)   B6 (%)
0.316                     43.16    41.81    8.19     4.89     1.54     0.37
0.636                     37.91    40.06    9.91     6.80     4.26     1.02
0.867                     40.97    42.02    8.25     5.71     2.62     0.39
0.983                     36.43    44.81    10.62    4.64     2.09     0.37


Table 3. Distribution of the energy percentage in the bands according to the production time of the vowel /i/
Production duration (s)   B1 (%)   B2 (%)   B3 (%)   B4 (%)   B5 (%)   B6 (%)
0.370                     76.07    2.71     0.009    0.006    19.21    1.95
0.651                     76.46    3.32     0.008    0.007    18.4     1.72
0.720                     76.36    2.99     0.008    0.007    18.9     1.66
0.967                     71.32    2.89     0.010    0.008    23.6     2.08


Table 4. Distribution of the energy percentage in the bands according to the production time of the vowel /u/
Production duration (s)   B1 (%)   B2 (%)   B3 (%)   B4 (%)   B5 (%)   B6 (%)
0.431                     87.73    11.64    0.52     0.01     0.05     0.03
0.558                     86.98    12.46    0.47     0.00     0.03     0.02
0.639                     80.80    17.95    1.13     0.03     0.05     0.00
0.948                     82.57    15.38    1.90     0.03     0.09     0.00


The examination of the band energy distribution of the vowels /a/, /i/ and /u/ (see Figures 1-3) reveals
two phases:
- A transient phase, which represents the beginning of vowel production and is characterized by large
changes in the values of the normalized band energies.
- A steady phase as the vowel production time increases, where the normalized energy in the bands (B1, B2,
B3 and B5) no longer shows significant variations. These results are consistent with those of [20].

Figure 1. The band energy distribution of the vowel /a/


Figure 2. The band energy distribution of the vowel /i/


Figure 3. The band energy distribution of the vowel /u/


3.2. Algorithm 

Based on the results obtained for the energy distribution of the vowels (/a/, /i/ and /u/) in the predefined
frequency bands according to production duration, we have developed an algorithm which allows the
classification of vowels. This algorithm consists of two parts: the first part is used to recognize the vowel
(/a, a:/, /i, i:/ or /u, u:/) and the second part is used to decide whether this vowel is short or long.

Vowel recognition: by analyzing the results of the energy distribution in the six frequency bands 
[20], it can be seen that the normalized energy in the B1 band distinguishes the vowel /a, a:/. The normalized 
energies of the B2 and B5 bands make it possible to distinguish /u, u:/ from /i, i:/. Figure 4 shows the 
algorithm that makes this classification possible. 
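
A decision rule of this kind might look like the sketch below. The actual thresholds of the flowchart are not given in the text, so the 60% cut-off on B1 and the direct comparison of B5 with B2 are purely hypothetical values chosen to be compatible with Tables 2-4, and `classify_vowel` is an illustrative name.

```python
def classify_vowel(band_percent, b1_threshold=60.0):
    """Assign /a/, /i/ or /u/ from the six band percentages (B1..B6).

    b1_threshold is a hypothetical cut-off, not a value from the paper.
    """
    b1, b2, _b3, _b4, b5, _b6 = band_percent
    if b1 < b1_threshold:
        # /a/ splits its energy between B1 and B2, so B1 stays moderate.
        return "a"
    # /i/ carries noticeable energy in B5, /u/ carries it in B2 instead.
    return "i" if b5 > b2 else "u"


# First rows of Tables 2, 3 and 4 as example inputs.
examples = {
    "a": [43.16, 41.81, 8.19, 4.89, 1.54, 0.37],
    "i": [76.07, 2.71, 0.009, 0.006, 19.21, 1.95],
    "u": [87.73, 11.64, 0.52, 0.01, 0.05, 0.03],
}
predictions = {label: classify_vowel(row) for label, row in examples.items()}
```

Using the relative size of B5 and B2 avoids a second absolute threshold, which seems reasonable given how far apart the two vowels are in those bands, but any real implementation would need thresholds tuned on the corpus.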

To distinguish the long vowels from the short ones, we calculated the average rate of change of the
normalized energy in the bands (B1, B2, B3, and B5) in the vowel nucleus. If this average is less than 3 dB,
the vowel is long; otherwise it is short. To determine the vowel nucleus, the vowel is divided into three
parts of equal length (each part represents 1/3 of the total length of the vowel): the first third is the
beginning of the vowel (onset), the second third is the nucleus, and the last third is the end of the
vowel (offset). The algorithm that distinguishes the short vowel from the long one is given in Figure 5.
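
The short/long decision can be sketched as below. The text does not define how the rate of change is measured, so this example assumes it is the absolute difference, in dB, between the last and first nucleus frames, averaged over bands B1, B2, B3 and B5. The function name and that definition are assumptions; only the 3 dB threshold and the choice of bands come from the text.

```python
import numpy as np


def is_long_vowel(e_db, threshold_db=3.0, bands=(0, 1, 2, 4)):
    """Return True when the nucleus is steady enough to be called long.

    e_db: band energies in dB, shape (n_frames, 6), columns B1..B6.
    bands: zero-based column indices of B1, B2, B3 and B5.
    """
    e_db = np.asarray(e_db, dtype=float)
    n_frames = e_db.shape[0]
    nucleus = e_db[n_frames // 3:(2 * n_frames) // 3][:, list(bands)]
    if nucleus.shape[0] < 2:
        raise ValueError("nucleus needs at least two frames")
    # Assumed measure: change between the first and last nucleus frame.
    change = np.abs(nucleus[-1] - nucleus[0])
    return bool(change.mean() < threshold_db)
```

A vowel whose band energies stay flat through the nucleus is labelled long, while one whose energies drift by several dB is labelled short.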


Figure 4. Flowchart of Arabic vowels classification /a/, /u/ and /i/


Figure 5. Flowchart of distinction between short and long vowel


3.3. Algorithm performance evaluation 

Our algorithm was implemented in MATLAB and tested on the data from our corpus to
determine its performance. The experiment covers a total of 1200 short and long vowels. Of these, 1167 vowels
are correctly classified, giving the classification process a total accuracy of 97.25%.
Table 5 gives more detailed results. As we can see, a relatively high number of errors were made
when identifying the vowels /u/ and /i/.
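
The reported accuracy can be checked directly from the confusion matrix of Table 5. The short sketch below uses the counts from that table; the variable names are illustrative, and expressing each per-vowel error rate relative to all 1200 tokens is an assumption that matches the values listed in the table.

```python
import numpy as np

# Rows: actual vowel, columns: predicted vowel, order /a/, /u/, /i/ (Table 5).
confusion = np.array([[396, 3, 1],
                      [0, 387, 13],
                      [2, 14, 384]])

total = int(confusion.sum())                     # all classified vowels
correct = int(np.trace(confusion))               # diagonal = correct decisions
accuracy = 100.0 * correct / total               # overall accuracy in percent

# Errors per actual vowel, relative to the whole test set.
errors = confusion.sum(axis=1) - np.diag(confusion)
error_percent = 100.0 * errors / total
```

Most of the off-diagonal counts lie between /u/ and /i/ (13 and 14 confusions), which is where the errors noted above come from.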

For the distinction between long and short vowels, we conducted classification tests on our corpus:
300 short vowels and 900 long vowels (400 records for each vowel: 100 short and 300 long). The classification
results are summarized in Tables 6-8. We can see that this algorithm correctly classifies 1136
vowels, hence a recognition rate of 94%. The overall recognition rate of the six short and long vowels is 92%.
These results are very competitive compared to those reported in the literature [17], [19], [24], [25].


Table 5. Confusion matrix of vowels
        /a/    /u/    /i/    Overall percentage of errors (%)
/a/     396    3      1      0.33
/u/     0      387    13     1.08
/i/     2      14     384    1.33

Table 6. Confusion matrix of short and long vowels of /a/
        /a/    /a:/
/a/     91     9
/a:/    10     290

Table 7. Confusion matrix of short and long vowels of /u/
        /u/    /u:/
/u/     89     11
/u:/    7      293

Table 8. Confusion matrix of short and long vowels of /i/
        /i/    /i:/
/i/     88     12
/i:/    15     285


4. CONCLUSION 

The main contribution of this paper is the development of an algorithm that recognizes each
Arabic vowel (/a/, /a:/, /u/, /u:/, /i/ and /i:/). Based on the fact that the energy distribution over time and
frequency of each sound depends on the articulator used and the place and manner of production, we
conducted an acoustic study based on the energy percentage in the bands (band 1: 0-400 Hz; band 2: 400-800 Hz;
band 3: 800-1200 Hz; band 4: 1200-2000 Hz; band 5: 2000-3500 Hz and band 6: 3500-5000 Hz). The results
demonstrate that each vowel can be classified using the normalized energy in the frequency bands. The
algorithms proposed in this work use these indices to recognize each vowel. The performance tests of these
algorithms on our Arabic corpus show a recognition rate of 92% for the vowels.

Several directions can be explored as extensions of this research. We can port these algorithms to
target platforms and test their robustness in noisy environments. Research can also be
directed toward the characterization of other phonemes.